Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary but limited training data
Abstract
This correspondence presents the first known results of complete recognition of continuous Mandarin speech for the Chinese language with a very large vocabulary but very limited training data. Various acoustic and linguistic processing techniques were developed, and a prototype continuous-speech Mandarin dictation machine has been successfully implemented. The best recognition accuracy achieved for the finally decoded Chinese characters is 92.2%.
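The abstract reports accuracy on decoded Chinese characters but does not define the metric. A common convention in Mandarin recognition, assumed here and not confirmed by the abstract, aligns the recognized character string with the reference using minimum edit distance. It then scores accuracy as (N - S - D - I) / N, where N is the number of reference characters and S, D, I are substitutions, deletions, and insertions. The sketch below is a minimal illustration of that convention. The function names `edit_ops` and `character_accuracy` are hypothetical and do not come from the paper.

```python
def edit_ops(ref: str, hyp: str) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for a minimum-cost
    Levenshtein alignment of hyp against ref, one character at a time."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # delete every reference character
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # insert every hypothesis character
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            # match costs nothing; a mismatch is one substitution
            diag = (t, s, d, k) if ref[i - 1] == hyp[j - 1] else (t + 1, s + 1, d, k)
            t, s, d, k = cost[i - 1][j]
            dele = (t + 1, s, d + 1, k)
            t, s, d, k = cost[i][j - 1]
            ins = (t + 1, s, d, k + 1)
            # tuples compare by total error count first
            cost[i][j] = min(diag, dele, ins)
    _, s, d, k = cost[n][m]
    return s, d, k


def character_accuracy(ref: str, hyp: str) -> float:
    """Assumed metric: (N - S - D - I) / N over reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    s, d, k = edit_ops(ref, hyp)
    return (len(ref) - s - d - k) / len(ref)


if __name__ == "__main__":
    # one substitution out of six reference characters
    print(character_accuracy("今天天气很好", "今天天七很好"))
```

Because insertions are counted, this measure can fall below zero for very noisy output. That is one reason it differs from a plain "percent correct" score.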
Similar papers
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving this archive, is one of the key issues for this broadcasting...
Characteristics of Chinese language models for large vocabulary telephone speech
This paper is concerned with language modeling (LM) for large vocabulary speech recognition in Mandarin Chinese. As the language characteristics of Chinese are quite unique, we investigate some novel techniques in language modeling. We also borrow some of the techniques that have been applied to other languages. Experiments have been conducted on the Call Home Mandarin, HUB4, and HUB5 corpora obtai...
Improved search strategy for large vocabulary continuous Mandarin speech recognition
This paper presents a new search strategy for large vocabulary continuous Mandarin speech recognition that takes into account the special structure of the Chinese language. The strategy consists of a forward pass and a backward pass, between which a high-quality syllable lattice is generated to bridge the syllable-level and word-level decoding processes. In the forward pass, considering the small number of sy...
Improving Large Vocabulary Accented Mandarin Speech Recognition with Attribute-Based I-Vectors
It is well recognized that accent has a great impact on the ASR of Mandarin Chinese; therefore, improving performance on accented speech has become a critical issue in this field. The attribute feature has been proven effective for modeling accented speech, resulting in significantly improved accent recognition performance. In this paper, we propose an attribute-base...
Improved large vocabulary Mandarin speech recognition by selectively using tone information with a two-stage prosodic model
The incorporation of prosodic information in large vocabulary continuous speech recognition has attracted much attention in recent years, especially for a tonal language such as Mandarin Chinese. The tones of some syllables are very difficult to recognize correctly because of their very complicated prosodic behavior, and tone recognition errors inevitably and seriously degrade recognition accuracy. We pr...